SPARK-3568 [mllib] add ranking metrics #2667
Conversation
QA tests have started for PR 2667 at commit
QA tests have finished for PR 2667 at commit
Test PASSed.
* ::Experimental::
* Evaluator for ranking algorithms.
*
* @param predictionAndLabels an RDD of (predicted ranking, ground truth set) pairs.
The inputs are really ranks, right? Would this not be more natural as Int then?
I might have expected that the inputs were instead predicted and ground truth "scores", in which case Double makes sense. But then the methods would need to convert the scores to rankings.
@srowen Thanks for all the comments! The current implementation only considers binary relevance, meaning the input is just labels instead of scores. It is true that Int is enough. IMHO, using Double is compatible with the current MLlib setting and could be extended to deal with the score setting.
Hm, are these IDs of some kind then? The example makes them look like rankings, but maybe that's a coincidence. They shouldn't be labels, right? Because the resulting set would almost always be {0.0, 1.0}.
@srowen Take the first test case for example: (Array(1, 6, 2, 7, 8, 3, 9, 10, 4, 5), Array(1 to 5)). For this single query, the ideal ranking algorithm should return Document 1 to Document 5, but the ranking algorithm actually returns 10 documents, with IDs (1, 6, ...). This setting right now only works for binary relevance.
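For concreteness, here is a minimal sketch of how such a (predicted ranking, ground truth set) pair could be expressed as the evaluator's input. It assumes a SparkContext `sc` (e.g. in spark-shell) and simply mirrors the test case quoted above; it is an illustration, not code from this PR.

```scala
// Assumes a SparkContext `sc`, e.g. in spark-shell.
// One query: the ranker returned documents 1, 6, 2, 7, 8, 3, 9, 10, 4, 5 (in that order),
// while the ground truth says documents 1 through 5 are the relevant ones.
val predictionAndLabels = sc.parallelize(Seq(
  (Array(1.0, 6.0, 2.0, 7.0, 8.0, 3.0, 9.0, 10.0, 4.0, 5.0),
   Array(1.0, 2.0, 3.0, 4.0, 5.0))
))
// Only membership in the ground truth set matters (binary relevance);
// the Double values here are document IDs, not relevance scores.
```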
Yes, I get it. But these could just as easily be Strings, maybe? Like document IDs? Anything you can put into a set and look for. Could this even be generified to accept any type? At least, Double seemed like the least likely type for an ID. It's not even what MLlib overloads to mean "ID"; for example, ALS assumes Int IDs.
+1 on @srowen's suggestion. We can use a generic type here with a ClassTag; see the sketch below.
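A rough sketch of what the generified signature might look like, assuming the class keeps the RDD-of-Array input discussed above (hypothetical, not the code currently in this PR):

```scala
import scala.reflect.ClassTag

import org.apache.spark.rdd.RDD

// Hypothetical sketch: the item type T is generic (Int, Long, String, ... document IDs),
// and a ClassTag context bound is needed because the items are stored in Arrays.
class RankingMetrics[T: ClassTag](predictionAndLabels: RDD[(Array[T], Array[T])]) {
  // Scalar metrics such as precisionAt(k), ndcgAt(k) and meanAveragePrecision would live here.
}
```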
... and if the code is just going to turn the arguments into a Set, then it need not be an RDD of Array, but something more generic?
@srowen We can optimize the implementation later, if it is worth doing, and using Array may help. Usually, both predictions and labels are small, so scanning them sequentially may be faster than toSeq and contains. Array is also consistent across Java, Scala, and Python, and storage-efficient for primitive types.
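As an illustration of the sequential-scan idea, here is a hypothetical helper (not code from this PR): the small ground truth Array is turned into a Set once, and the prediction Array is walked once up to position k.

```scala
// Hypothetical helper: count how many of the top-k predicted items are relevant.
def countHitsAtK[T](pred: Array[T], lab: Array[T], k: Int): Int = {
  val labSet = lab.toSet          // ground truth is small, so this is cheap
  val n = math.min(pred.length, k)
  var hits = 0
  var i = 0
  while (i < n) {                 // single sequential pass over the predictions
    if (labSet.contains(pred(i))) hits += 1
    i += 1
  }
  hits
}

// e.g. countHitsAtK(Array(1, 6, 2, 7, 8), Array(1, 2, 3, 4, 5), 5) == 2
```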
QA tests have started for PR 2667 at commit
QA tests have finished for PR 2667 at commit
Test PASSed.
/**
 * Returns the precsion@k for each query
 */
lazy val precAtK: RDD[Array[Double]] = predictionAndLabels.map {case (pred, lab)=>
Returning an RDD may not be useful in evaluation. Usually people look for scalar metrics. prec@k and ndcg@k on a test dataset usually mean the average, not the individual per-query values.
Should it be private? Think about what users call it for. Even if we make it public, users may still need some aggregate metrics out of it. For the first version, I think it is safe to provide only the following (see the usage sketch after this comment):
- precisionAt(k: Int): Double
- ndcgAt(k: Int): Double
- meanAveragePrecision

Also, `{case (pred, lab)=>` -> `{ case (pred, lab) =>` (spaces after `{` and `)`).
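For illustration, a hedged sketch of how that scalar API might be used, assuming the evaluator ends up exposed as org.apache.spark.mllib.evaluation.RankingMetrics and a SparkContext `sc` is available (e.g. in spark-shell); the data and values are illustrative only.

```scala
import org.apache.spark.mllib.evaluation.RankingMetrics

// Two queries, each a (predicted ranking, ground truth set) pair.
val predictionAndLabels = sc.parallelize(Seq(
  (Array(1.0, 6.0, 2.0, 7.0, 8.0, 3.0, 9.0, 10.0, 4.0, 5.0), Array(1.0, 2.0, 3.0, 4.0, 5.0)),
  (Array(4.0, 1.0, 5.0, 6.0, 2.0, 7.0, 3.0, 8.0, 9.0, 10.0), Array(1.0, 2.0, 3.0))
))

val metrics = new RankingMetrics(predictionAndLabels)
val p5  = metrics.precisionAt(5)        // precision truncated at position 5, averaged over queries
val n5  = metrics.ndcgAt(5)             // NDCG truncated at position 5, averaged over queries
val map = metrics.meanAveragePrecision  // mean average precision over all queries
```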
btw, I think it is better to use precisionAt instead of precision to emphasize that the parameter is a position. We use precision(class) in MulticlassMetrics, and users may get confused if we use the same name.
@coderxiang
@mengxr @coderxiang Hm, come to think of it, doesn't this duplicate a lot of the purpose of https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala? It also computes precision and recall, and supports multi-class classification instead of the binary case only. It feels like these two should be rationalized and not left as two unrelated implementations.
@srowen Ranking metrics are different from multiclass metrics. In general, multiclass metrics do not consider the ordering of the predictions, just hits and misses, and they don't truncate the result. Ranking metrics consider the ordering of predictions, applying discounts and truncating the result. I think they are two separate sets of evaluation metrics.
@mengxr Yes, I understand these metrics. Precision / recall are binary classifier metrics at heart (but not nDCG, for example), while precision@k needs ranking; that's why precision/recall turn up in both. I understand it is two views on the same metric, and these are collections of different metrics. Hm, maybe this makes more sense if the metrics here are clearly precision@k and recall@k, with an argument k. The other class has F-measure, true positive rate, etc. I suppose those could be implemented here too for consistency, but I don't know how useful they are. Meh, anything to rationalize the purposeful difference between these two would help. Maybe it's just me.
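To make the ordering point concrete, here is a hedged sketch (using the same assumed RankingMetrics API and SparkContext `sc` as above): the two queries retrieve exactly the same set of documents, so a pure hit/miss view sees no difference, yet precision@1 differs because the ordering differs.

```scala
import org.apache.spark.mllib.evaluation.RankingMetrics

// Same retrieved set {1.0, 2.0, 6.0} and same ground truth {1.0, 2.0}, different orderings.
val ordered   = sc.parallelize(Seq((Array(1.0, 2.0, 6.0), Array(1.0, 2.0))))
val reordered = sc.parallelize(Seq((Array(6.0, 1.0, 2.0), Array(1.0, 2.0))))

new RankingMetrics(ordered).precisionAt(1)    // 1.0 -- the top-ranked document is relevant
new RankingMetrics(reordered).precisionAt(1)  // 0.0 -- the top-ranked document is not relevant
```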
@srowen Making
Can one of the admins verify this patch?
@mengxr @srowen Sorry for the late response. In the update, I added two new members that support computing
QA tests have started for PR 2667 at commit
QA tests have finished for PR 2667 at commit
Test PASSed.
import org.apache.spark.rdd.RDD


remove extra empty lines
QA tests have started for PR 2667 at commit
QA tests have finished for PR 2667 at commit
Test PASSed.
QA tests have finished for PR 2667 at commit
Test PASSed.
* Compute the average precision of all the queries, truncated at ranking position k.
*
* If for a query, the ranking algorithm returns n (n < k) results, the precision value will be
* computed as #(relevant items retrived) / k. This formula also applies when the size of the
retrived -> retrieved
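As a worked example of the truncation rule above: if k = 5 but a query returns only the 3 documents (1, 6, 2) against ground truth {1, 2, 3, 4, 5}, then 2 relevant documents are retrieved and precision@5 is 2 / 5 = 0.4, not 2 / 3; the denominator stays at k even though fewer than k results were returned.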
QA tests have started for PR 2667 at commit
Test FAILed.
QA tests have finished for PR 2667 at commit
Test PASSed.
QA tests have started for PR 2667 at commit
QA tests have finished for PR 2667 at commit
Test PASSed.
LGTM. Merged into master. Thanks!
Add common metrics for ranking algorithms (http://www-nlp.stanford.edu/IR-book/), including precision@k, NDCG@k, and mean average precision. The following methods and the corresponding tests are implemented:
- precisionAt(k: Int): Double
- ndcgAt(k: Int): Double
- meanAveragePrecision
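As a worked check against the first test case discussed above (predictions (1, 6, 2, 7, 8, 3, 9, 10, 4, 5), ground truth {1, ..., 5}), and assuming the standard definition of average precision with the number of relevant documents as the denominator: the relevant documents are hit at ranks 1, 3, 6, 9 and 10, so the average precision for that query is (1/1 + 2/3 + 3/6 + 4/9 + 5/10) / 5 ≈ 0.622, and meanAveragePrecision is this quantity averaged over all queries.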